Prune index for `<package>-<version>` before updating by ruslandoga · Pull Request #61 · hexpm/hexdocs

ruslandoga · 2025-10-24T11:47:40Z

This should help resolve Elixir main and similar duplicates.

ericmj · 2025-10-24T12:52:38Z

Should we reprocess all packages to delete the duplicates?

ruslandoga · 2025-10-24T14:32:41Z

Should we reprocess all packages to delete the duplicates?

If it's not too hard -- yes, and probably only for the special_packages (but I am not sure).

^{Alternatively the duplicates could be removed with a "script" that would go through all current special packages docs and deduplicate each ref by picking the doc with latest id but it might leave some stale docs and I'd rather avoid doing that :)}

josevalim · 2025-10-24T15:49:17Z

Should we reprocess all packages to delete the duplicates?

No need IMO. If there are duplicates, as soon as the maintainer does mix hex.publish docs the duplicates will be removed. But it may not hurt either. Your call.

josevalim · 2025-10-24T15:50:23Z

For example, once this is merged, the duplicates for Elixir will automatically be fixed once we commit to main/v1.19. So we should be fine.

ruslandoga · 2025-10-24T16:47:24Z

I also see elixir-1.18 (and iex, ex_unit, etc.) in the index. Should I delete them since there is elixir-1.18.4 already? And I can re-insert elixir-1.18.4 docs as elixir-1.18 afterwards too.

ericmj · 2025-10-24T18:11:15Z

yes, and probably only for the special_packages (but I am not sure).

It can also happen for conventional packages since docs can be republished at any time. Docs don't have the same limitations as republishing packages.

ericmj · 2025-10-24T18:15:13Z

I will reprocess all packages this weeked, it will also be useful to find if we created any new regressions in the pipeline.

ruslandoga · 2025-10-24T19:53:17Z

it will also be useful to find if we created any new regressions in the pipeline.

One thing I noticed about delete operations is that they can be very slow (several minutes) for queries affecting a lot of docs -- like deleting elixir-main -- so they could probably hit timeouts in the HTTP client or maybe even SQS visibility timeouts. But maybe once the duplicates are handled, deleting would get faster.

josevalim · 2025-10-24T20:16:35Z

Maybe we should do it for elixir-main explicitly then? Cause it will probably take forever with all duplicates!

ruslandoga · 2025-10-24T20:44:13Z

Cause it will probably take forever with all duplicates!

Yes, elixir-main was an outlier. Deleting others would probably finish faster.

Maybe we should do it for elixir-main explicitly then?

I have already deleted it (but it has been re-indexed with fewer or no duplicates since) ... that's how I found out it was slow ...

These were the counts a few days ago (pre-delete):

select package, count(*), count(*) / (select count(*) from 'documents-export-hexdocs-prod-10-21-2025--7-38-08-PM.jsonl') * 100 as '%' from 'documents-export-hexdocs-prod-10-21-2025--7-38-08-PM.jsonl' group by package order by 2 desc limit 40;

┌───────────────────┬──────────────┬─────────────────────┐
│      package      │ count_star() │          %          │
│      varchar      │    int64     │       double        │
├───────────────────┼──────────────┼─────────────────────┤
│ elixir-main       │      2181458 │  17.354231789380247 │
│ elixir-1.18       │       349102 │   2.777223777004289 │
│ elixir-1.19       │       286303 │  2.2776366191762265 │
│ mix-main          │       237975 │  1.8931711314532595 │
│ ex_unit-main      │       102540 │  0.8157401736283947 │
│ iex-main          │        71159 │  0.5660937684340056 │
│ logger-main       │        50615 │ 0.40265934160523886 │
│ mix-1.18          │        38226 │ 0.30410068146205393 │
│ mix-1.19          │        31065 │   0.247132518956174 │
│ docusign-2.0.0    │        26054 │ 0.20726832927359268 │
│ ash-3.4.69        │        25335 │ 0.20154844254803372 │
│ evision-0.2.1     │        25289 │ 0.20118249708297709 │
│ evision-0.2.0     │        25287 │ 0.20116658641058335 │
│ evision-0.1.39    │        25258 │ 0.20093588166087373 │
│ evision-0.1.38    │        25195 │ 0.20043469548047008 │
│ evision-0.1.37    │        25193 │ 0.20041878480807634 │
│ evision-0.1.35    │        25169 │ 0.20022785673935112 │
│ evision-0.1.36    │        25155 │ 0.20011648203259474 │
│ evision-0.1.34    │        25007 │ 0.19893909227545603 │
│ evision-0.1.33    │        25005 │  0.1989231816030623 │
│ evision-0.1.32    │        25002 │ 0.19889931559447163 │
│ eex-main          │        24722 │  0.1966718214593444 │
│ ash-3.5.26        │        24014 │ 0.19103944343195112 │
│ ash-3.4.56        │        22836 │ 0.18166805739202282 │
│ ash-3.5.12        │        22282 │ 0.17726080113894957 │
│ ash-3.4.62        │        20273 │  0.1612785307194114 │
│ aws-1.0.0         │        17118 │ 0.13617944501824517 │
│ raxol-0.5.0       │        16820 │   0.133808754831574 │
│ aws-0.14.1        │        16696 │  0.1328222931431605 │
│ ex_unit-1.18      │        16468 │ 0.13100847649027114 │
│ evision-0.2.14    │        16342 │ 0.13000610412946387 │
│ evision-0.2.13    │        16340 │  0.1299901934570701 │
│ evision-0.2.12    │        16338 │ 0.12997428278467635 │
│ evision-0.2.11    │        16336 │ 0.12995837211228256 │
│ signet-1.2.3      │        16174 │ 0.12866960764838753 │
│ evision-0.2.9     │        16137 │ 0.12837526020910284 │
│ evision-0.2.8     │        16135 │ 0.12835934953670908 │
│ ash-3.4.60        │        16028 │ 0.12750812856364258 │
│ aws_erlang-1.0.4  │        15651 │ 0.12450896681741763 │
│ procore_sdk-0.3.0 │        15573 │ 0.12388845059406076 │
├───────────────────┴──────────────┴─────────────────────┤
│ 40 rows                                      3 columns │
└────────────────────────────────────────────────────────┘

ruslandoga added 2 commits October 24, 2025 14:46

delete first

3a8b63b

allow indexing Elixir main and co

aaeb352

ruslandoga marked this pull request as ready for review October 24, 2025 12:03

use elixir main in test

d9ebf24

ericmj approved these changes Oct 24, 2025

View reviewed changes

josevalim approved these changes Oct 24, 2025

View reviewed changes

ericmj merged commit 8328400 into hexpm:main Oct 24, 2025
2 checks passed

ruslandoga deleted the delete-first branch October 24, 2025 19:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Prune index for `<package>-<version>` before updating#61

Prune index for `<package>-<version>` before updating#61
ericmj merged 3 commits intohexpm:mainfrom
ruslandoga:delete-first

ruslandoga commented Oct 24, 2025

Uh oh!

ericmj commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 •

edited

Loading

Uh oh!

josevalim commented Oct 24, 2025

Uh oh!

josevalim commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 •

edited

Loading

Uh oh!

ericmj commented Oct 24, 2025

Uh oh!

Uh oh!

ericmj commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 •

edited

Loading

Uh oh!

josevalim commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

ruslandoga commented Oct 24, 2025

Uh oh!

ericmj commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josevalim commented Oct 24, 2025

Uh oh!

josevalim commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ericmj commented Oct 24, 2025

Uh oh!

Uh oh!

ericmj commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josevalim commented Oct 24, 2025

Uh oh!

ruslandoga commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ruslandoga commented Oct 24, 2025 •

edited

Loading

ruslandoga commented Oct 24, 2025 •

edited

Loading

ruslandoga commented Oct 24, 2025 •

edited

Loading

ruslandoga commented Oct 24, 2025 •

edited

Loading